Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removing Operator specific handling during a StudyJob run #387

Merged
merged 2 commits into from
Feb 20, 2019

Conversation

johnugeorge
Copy link
Member

@johnugeorge johnugeorge commented Feb 17, 2019

During StudyJob run, there are no more operator specific code. The unstructured is used to create the resource, get the resource and get the status

The only remaining is the watch changes that has to be separately handled in #317. Current controller runtime code(0.1.7 and less) doesn't support dynamic watch. Refer: kubernetes-sigs/kubebuilder#422

Related: #341


This change is Reviewable

@johnugeorge
Copy link
Member Author

/assign @richardsliu @hougangliu

var state katibapi.State = katibapi.State_RUNNING
var cpTime *metav1.Time
switch w.Kind {
switch wkind.Kind {

case DefaultJobWorker:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this case still needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But the string values are different currently. 'Complete' in https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/batch/v1/types.go#L170 vs 'Succeeded' in https://github.com/kubeflow/tf-operator/blob/master/pkg/apis/common/v1beta2/common_types.go#L121

In general, since 'job' is a built-in K8s Kind, should we care about it matching with our JobCondition? I feel that the only necessary condition is that all custom job operators share the same JobStatus type.

Or other option is, we can change the common JobConditionType string to match the batch Job's JobConditionType. But new developers might not be aware of this assumption and may break the logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ok for this version. One thing that we have discussed in the past is moving the common types into its own repo. If we go down that route, it will be easier to change then. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. There is a discussion in kubeflow/mpi-operator#92 (comment)

if err := r.Client.Get(context.TODO(), nname, &job); err != nil {
log.Printf("Client Get error %v for %v", err, nname)
return WorkerStatus{}
}
if job.Status.Active == 0 && job.Status.Succeeded > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we should be able to use the similar logic here to check for job completion: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/batch/v1/generated.proto#L158

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see above comment

@richardsliu
Copy link
Contributor

/lgtm

jobStatus := commonv1beta1.JobStatus{}
err := runtime.DefaultUnstructuredConverter.FromUnstructured(statusMap, &jobStatus)
if err != nil {
log.Printf("Error in converting unstructured to status:%v ", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/status:%v/status: %v

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return WorkerStatus{}

@k8s-ci-robot k8s-ci-robot removed the lgtm label Feb 20, 2019
@hougangliu
Copy link
Member

/lgtm

@hougangliu
Copy link
Member

/lgtm
/approve

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hougangliu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 8a89b9e into kubeflow:master Feb 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants